Safe Learning for Near-Optimal Scheduling

Abstract

In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have $10^{20}$ states and beyond, which cannot be handled with state-of-the-art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.


Similar resources

Near-optimal Regret Bounds for Reinforcement Learning

This technical report is an extended version of [1]. For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ i...


Near-Optimal Course Scheduling at the Technion

The focus of this article is the automation of course, classroom, and exam scheduling for the faculty of Industrial Engineering (IE) at the Technion in Haifa, Israel. The system, called the Technion Industrial Engineering Scheduler (TieSched), has been operational since 2012. It is based on a distributed collection of constraints and multiple engines running in parallel, including SAT, pseudo-B...


Near-optimal Regret Bounds for Reinforcement Learning

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). We present a reinfo...


Near optimal algorithms for scheduling independent chains in BSP

The aim of this work is to show that scheduling a set of independent chains on a parallel machine under the BSP model is a difficult optimization problem which can be easily approximated in practice. BSP is a machine independent computational model which is becoming more and more popular [7]. Finding the optimal solution when the number of processors is fixed is shown to be hard. Efficient heur...


Near-Optimal Scheduling for LTL with Future Discounting

We study synthesis of optimal schedulers for the linear temporal logic (LTL) with future discounting. The logic, introduced by Almagor, Boker and Kupferman, is a quantitative variant of LTL in which an event in the far future has only discounted contribution to a truth value (that is a real number in the unit interval [0, 1]). The precise problem we study—it naturally arises e.g. in search for ...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-85172-9_13